Abstract. - The performance of "typical set (pairs) decoding" for ensembles of Gallager's
linear code is investigated using statistical physics. In this decoding scheme, an error occurs
either when the transmission is corrupted by an atypical noise vector, or when two or more
typical sequences satisfy the parity check equation defined by a received codeword to which
typical noise has been added. We show that the average error rate for the latter case over a
given code ensemble can be tightly evaluated using the replica method, including its sensitivity
to the message length. Our approach generally improves on the existing analysis known in the
information theory community, which was reintroduced by MacKay (1999) and is believed to be
the most accurate to date.



Triggered by active investigations on error correcting codes in both the information theory
(IT) and statistical physics (SP) communities [...], there is a growing interest in the
relationship between IT and SP. Since it has turned out that the two frameworks, despite their
different backgrounds, have investigated similar subjects, it is quite natural to expect that
standard techniques known in one framework can bring about remarkable developments in
the other, and vice versa.

The purpose of this Letter is to present such an example. More specifically, we will show that
a method to evaluate the performance of error correcting codes established in the IT community
[...] can be generally improved by introducing the replica method. This serves as a direct
answer to a question from IT researchers as to why the methods from physics always provide more
optimistic evaluations than those known in the IT literature. In our formulation, the IT method
is naturally linked to the existing SP analysis by being parametrized by the number of replicas
$\rho \ge 0$, which clearly explains how the IT and SP methods are related to each other.
In a general scenario, the $N$ dimensional Boolean message $x \in \{0, 1\}^N$ is encoded to the
$M (> N)$ dimensional Boolean vector $y^0$, and transmitted via a noisy channel, which is taken
here to be a Binary Symmetric Channel (BSC) characterized by flip probability $p$ per bit; other
transmission channels may also be examined within a similar framework. At the other end of
the channel, the corrupted codeword is decoded utilizing the structured codeword redundancy.

The error correcting code that we focus on here is Gallager's linear code [...]. This code was
originally introduced by Gallager about forty years ago but was almost forgotten soon after
its proposal due to the technological limitations of those days. However, since its recent
rediscovery by MacKay and Neal, it has been recognized as one of the best codes to date.

A code of this type is characterized by a randomly generated $(M - N) \times M$ Boolean
sparse parity check matrix $H$, composed of $K$ and $C$ ($\ge 3$) non-zero (unit) elements per
row and column, respectively. Encoding the message vector $x$ is carried out using the
$M \times N$ generating matrix $G^T$, satisfying the condition $HG^T = 0$, where
$y^0 = G^T x$ (mod 2). The $M$ bit codeword $y^0$ is transmitted via a noisy channel, the BSC
in the current analysis; the corrupted vector $y = y^0 + n^0$ (mod 2) is received at the other
end, where $n^0 \in \{0, 1\}^M$ represents a noise vector with an independent probability $p$
per bit of having a value 1. Decoding is carried out by multiplying $y$ by the parity check
matrix $H$ to obtain the syndrome vector $z = Hy = H(G^T x + n^0) = Hn^0$ (mod 2), and then
finding a solution to the parity check equation

Hn = z (mod 2) , (1) 

for estimating the true noise vector $n^0$. One retrieves the original message using the equation
$G^T x = y - n$ (mod 2), where $n$ is the obtained solution; $x$ then becomes an estimate of the
original message.
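
As a concrete illustration of the syndrome relation $z = Hy = Hn^0$ (mod 2), the following minimal Python sketch may help. It uses a small hand-written parity check matrix, which is a hypothetical toy example (it is neither sparse nor drawn from Gallager's ensemble), together with a hand-picked codeword and noise vector. All names are illustrative assumptions.

```python
def syndrome(H, v):
    """Multiply the Boolean matrix H by the Boolean vector v over GF(2)."""
    return [sum(h * x for h, x in zip(row, v)) % 2 for row in H]

# Hypothetical (M - N) x M parity check matrix with M = 6 and N = 3.
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]

y0 = [1, 0, 1, 1, 1, 0]   # a codeword of the toy code: H y0 = 0 (mod 2)
n0 = [0, 0, 0, 0, 1, 0]   # hand-picked noise vector with a single flipped bit
y = [(a + b) % 2 for a, b in zip(y0, n0)]   # received vector y = y0 + n0 (mod 2)

z = syndrome(H, y)        # the syndrome depends on the noise only
```

Because $Hy^0 = 0$, the syndrome computed from the received vector coincides with the one computed from the noise alone, which is what allows decoding to be phrased as solving Eq. (1) for $n$.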

Several schemes can be employed for solving Eq. (1). In recent years, the maximum a
posteriori (MAP) and the maximizer of posterior marginals (MPM) decodings, which correspond
to zero temperature and Nishimori's temperature, respectively, have been widely investigated
[...]. However, we will here evaluate the performance of another scheme termed typical set
(pairs) decoding, which was pioneered by Shannon [...] and reintroduced by MacKay for
analyzing Gallager-type codes. Although this decoding method is slightly weaker in reducing
the block or bit error rates, a rigorous analysis becomes easier than for the above two
decoding methods, and its investigation is now becoming popular in the IT community [...].

In order to discuss typical set decoding, we first introduce the definition of typicality. Due
to the law of large numbers, a noise vector $n$ generated by the BSC satisfies the condition



$$\left| \frac{1}{M} \sum_{l=1}^{M} n_l - p \right| < \epsilon_M, \tag{2}$$



with a high probability for large $M$ and any sequence of positive numbers
$\epsilon_M \sim O(M^{-\gamma})$ ($0 < \gamma < 1/2$). We say that a vector $n$ is typical when
this condition is satisfied, and we call the set of all typical vectors the typical set.
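
The typicality condition of Eq. (2) can be sketched as a simple predicate. In this hedged example the tolerance is taken as $\epsilon_M = M^{-\gamma}$ with unit prefactor and $\gamma = 1/4$; both choices are assumptions made only for illustration, since the text fixes the scaling of $\epsilon_M$ but not its constant.

```python
def is_typical(n, p, gamma=0.25):
    """Check |(1/M) * sum(n) - p| < eps_M for a Boolean noise vector n.

    eps_M = M ** (-gamma) is an assumed concrete choice with 0 < gamma < 1/2.
    """
    M = len(n)
    eps_M = M ** (-gamma)
    return abs(sum(n) / M - p) < eps_M

# Toy vectors of length M = 100 for flip probability p = 0.1.
near_expected = [1] * 10 + [0] * 90   # fraction of flips equals p
far_from_expected = [1] * 60 + [0] * 40   # fraction of flips is 0.6
```

For finite $M$ the tolerance is rather loose, so the predicate only becomes sharp in the large $M$ regime considered in the analysis.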

One can then define typical set decoding as a scheme that selects a vector $n$ that belongs
to the typical set and satisfies Eq. (1) as an estimate of the true noise $n^0$. When two or
more typical vectors satisfy Eq. (1), an error is automatically declared. For this scheme, two
types of decoding error can occur: the first, referred to here as the type I error, takes place
when the true noise $n^0$ is not typical, while the other, termed the type II error, is declared
when two or more typical vectors satisfy Eq. (1) even though the true noise $n^0$ is typical.
It can be shown that the probability of the type I error, $P_I$, vanishes in the limit
$M \to \infty$. Therefore, we will here focus on the evaluation of the probability of the
type II error, $P_{II}$.
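
The decoding rule described above can be sketched by brute-force enumeration for a very small code. This is only an illustrative sketch under stated assumptions: the parity check matrix is a hypothetical toy example, the typicality test is passed in as a user-supplied predicate, and exhaustive search is feasible only for tiny $M$. The status labels are illustrative names, not terminology from the analysis.

```python
from itertools import product

def syndrome(H, v):
    """Multiply the Boolean matrix H by the Boolean vector v over GF(2)."""
    return [sum(h * x for h, x in zip(row, v)) % 2 for row in H]

def typical_set_decode(H, z, is_typical):
    """Return (status, estimate) by enumerating all typical solutions of H n = z."""
    M = len(H[0])
    candidates = [list(n) for n in product((0, 1), repeat=M)
                  if is_typical(n) and syndrome(H, n) == z]
    if len(candidates) == 1:
        return "unique", candidates[0]
    if not candidates:
        return "no_typical_solution", None
    return "ambiguous", None   # two or more typical solutions: an error is declared

# Hypothetical toy parity check matrix (M = 6, N = 3) and a syndrome.
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
z = [0, 1, 0]

# Two hypothetical notions of "typical" for this toy: exactly one flip, or at most two.
narrow = typical_set_decode(H, z, lambda n: sum(n) == 1)
wide = typical_set_decode(H, z, lambda n: sum(n) <= 2)
```

With the narrow predicate a single typical solution exists and decoding succeeds, whereas the wider predicate admits several typical solutions, which is the situation counted as a type II error when the true noise is itself typical.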

To proceed further, it is convenient to employ the binary ($\pm 1$) expression for bit sequences
rather than the Boolean one, utilizing the mapping $\{0, 1, +\} \to \{+1, -1, \times\}$. This
makes it possible to introduce the error indicator function, which becomes one when an error
happens and zero otherwise, as



$$\Delta(n^0, H) = \lim_{\rho \to +0} V_{NF}^{\rho}(n^0, H), \tag{3}$$



where 



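$$\begin{aligned}
V_{NF}(n^0, H) &= \mathop{\rm Tr}_{n \ne n^0} \prod_{\mu=1}^{M-N} \delta\left( \prod_{l \in \mathcal{L}(\mu)} n_l^0 ;\ \prod_{l \in \mathcal{L}(\mu)} n_l \right) \delta\left( \sum_{l=1}^{M} n_l - M \tanh F \right) \\
&= \mathop{\rm Tr}_{n \ne 1} \prod_{\mu=1}^{M-N} \delta\left( 1 ;\ \prod_{l \in \mathcal{L}(\mu)} n_l \right) \delta\left( \sum_{l=1}^{M} n_l^0 n_l - M \tanh F \right),
\end{aligned} \tag{4}$$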



where we have introduced the gauge transform $n_l \to n_l^0 n_l$ in the last form of Eq. (4) for
further convenience, and where $1$ denotes the $M$ dimensional vector all the elements of which
are 1. Eq. (4) gives the number of vectors that differ from $n^0$ in the intersection of the
typical set and the solution space of Eq. (1). The field $F = (1/2) \ln[(1 - p)/p]$ and
$\mathcal{L}(\mu)$ represent the level of the channel noise and the set of indices that have
non-zero elements in the $\mu$th row of the parity check matrix $H$, respectively.
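
The counting expressed by Eq. (4), and its invariance under the gauge transform, can be checked by exhaustive enumeration on a toy example. The sketch below is a hedged illustration: the index sets, the flip probability $p = 1/3$ (chosen so that $M \tanh F = M(1 - 2p)$ is an integer) and all function names are assumptions introduced here.

```python
import math
from itertools import product

def to_pm(n):
    """Map Boolean bits {0, 1} to binary variables {+1, -1}."""
    return [1 - 2 * b for b in n]

def parity_products(rows, s):
    """Product of the binary variables over each index set L(mu)."""
    return [math.prod(s[l] for l in L) for L in rows]

def count_direct(rows, s0, target):
    """First form: vectors s != s0 with the same parities as s0 and sum(s) = target."""
    ref = parity_products(rows, s0)
    return sum(1 for s in product((1, -1), repeat=len(s0))
               if list(s) != s0
               and parity_products(rows, s) == ref
               and sum(s) == target)

def count_gauged(rows, s0, target):
    """Gauged form: vectors s != 1 with all parities +1 and sum(s0 * s) = target."""
    return sum(1 for s in product((1, -1), repeat=len(s0))
               if any(x != 1 for x in s)
               and all(v == 1 for v in parity_products(rows, s))
               and sum(a * b for a, b in zip(s0, s)) == target)

# Hypothetical index sets L(mu) (0-based) of a toy parity check matrix with M = 6.
rows = [(0, 1, 3), (1, 2, 4), (0, 2, 5)]
M = 6
p = 1 / 3                                # toy flip probability
F = 0.5 * math.log((1 - p) / p)          # field F = (1/2) ln[(1 - p)/p]
target = round(M * math.tanh(F))         # M tanh F = M (1 - 2p)
s0 = to_pm([0, 1, 0, 1, 0, 0])           # true noise with two flipped bits

v_direct = count_direct(rows, s0, target)
v_gauged = count_gauged(rows, s0, target)
```

Both counts agree because the map $n_l \to n_l^0 n_l$ is a bijection on $\{\pm 1\}^M$ that sends $n^0$ to the all-ones vector.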

From the definition, the probability of the type II error for a given matrix $H$ is given as
$P_{II}(H) = \left\langle \Delta(n^0, H)\, \delta\left( \sum_{l=1}^{M} n_l^0 - M \tanh F \right) \right\rangle_{n^0}$,
where $\langle \cdots \rangle_{n^0} = \mathop{\rm Tr}_{n^0} (\cdots) \exp\left[ F \sum_{l=1}^{M} n_l^0 \right] / (2 \cosh F)^M$.
Since the parity check matrix $H$ is generated somewhat randomly, it is natural
to evaluate the average of $P_{II}(H)$ over an ensemble of codes for given parameters $K$ and $C$
as a performance measure for the code ensemble. Employing Eq. (3), the average is given as
$P_{II} = \lim_{\rho \to +0} \exp[-M \mathcal{E}(\rho)]$, where



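$$\mathcal{E}(\rho) = -\frac{1}{M} \ln \left\langle \left\langle V_{NF}^{\rho}(n^0, H)\, \delta\left( \sum_{l=1}^{M} n_l^0 - M \tanh F \right) \right\rangle_{n^0} \right\rangle_H, \tag{5}$$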





for large $M$. Here, $\langle \cdots \rangle_H$ represents an average over the uniform
distribution of the parity check matrix for a given choice of parameters $K$ and $C$.

Before proceeding any further, it is worth mentioning some general properties of the exponent
$\mathcal{E}(\rho)$. First, $P_{II}$ is expected to vanish in the limit $M \to \infty$ for a
sufficiently small noise level $p$. The highest noise level $p_c$ for which this holds is termed
the error threshold [...]. This happens when
$\mathcal{E}(0) = \lim_{\rho \to +0} \mathcal{E}(\rho) > 0$. The value of $\mathcal{E}(0) > 0$
represents the sensitivity of $P_{II}$ to the message length and serves as a performance measure
of the code ensemble when $M$ is finite. Next, since $V_{NF}(n^0, H) = 0, 1, 2, \ldots$,
$V_{NF}^{\rho}(n^0, H)$ increases with respect to $\rho$, which implies that the exponent
$\mathcal{E}(\rho)$ is a decreasing function of $\rho > 0$. This is linked to the inequality



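$$\frac{\partial \mathcal{E}(\rho)}{\partial \rho} = -\frac{1}{M}\, \frac{\left\langle \left\langle S_{NF}(n^0, H)\, V_{NF}^{\rho}(n^0, H)\, \delta\left( \sum_{l=1}^{M} n_l^0 - M \tanh F \right) \right\rangle_{n^0} \right\rangle_H}{\left\langle \left\langle V_{NF}^{\rho}(n^0, H)\, \delta\left( \sum_{l=1}^{M} n_l^0 - M \tanh F \right) \right\rangle_{n^0} \right\rangle_H} \le 0, \tag{6}$$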



where $S_{NF}(n^0, H) = \ln V_{NF}(n^0, H)$ is the entropy representing the number of wrong
solutions of Eq. (1) belonging to the typical set. One can also show that
$\partial^2 \mathcal{E}(\rho) / \partial \rho^2 \le 0$, which implies that $\mathcal{E}(\rho)$
is a convex upward (concave) function of $\rho$.
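
These general properties can be illustrated numerically on a toy distribution of the count $V$. The sketch below is not the ensemble average of Eq. (5); it simply assumes a hypothetical discrete distribution of non-negative integer counts and a hypothetical length $M$, and evaluates $-(1/M) \ln \langle V^{\rho} \rangle$ for several $\rho$.

```python
import math

def exponent(rho, values, probs, M):
    """Toy exponent -(1/M) ln <V^rho> for a discrete distribution of V (rho > 0)."""
    avg = sum(w * (v ** rho) for v, w in zip(values, probs))
    return -math.log(avg) / M

values = [0, 1, 2, 5]                # hypothetical counts of wrong typical solutions
probs = [0.25, 0.25, 0.25, 0.25]     # hypothetical probabilities of those counts
M = 10                               # hypothetical code length

rhos = [0.25 * k for k in range(1, 9)]
E = [exponent(r, values, probs, M) for r in rhos]

# As rho -> +0, V^rho tends to the indicator of V >= 1, so the limit is
# -(1/M) ln Prob(V >= 1), which is positive whenever Prob(V = 0) > 0.
E_limit = exponent(1e-9, values, probs, M)
```

In this toy setting the exponent is non-increasing in $\rho$ and has non-positive second differences, in line with inequality (6) and the sign of the second derivative, and its $\rho \to +0$ limit recovers the probability that at least one wrong solution exists.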

We are now ready to connect the current argument to the existing analysis of typical set
decoding [...]. Since $\mathcal{E}(0) \ge \mathcal{E}(1)$, one can obtain a lower bound of $p_c$
from the condition $\mathcal{E}(1) = 0$. For $\rho = 1$ in Eq. (5), it is convenient to insert
the identity $1 = \int d\omega\, \delta\left( \sum_{l=1}^{M} n_l - M\omega \right)$ in the final
form of Eq. (4). Then, for a sequence $n$ that satisfies $(1/M) \sum_{l=1}^{M} n_l = \omega$, one
obtains
$\left\langle \delta\left( \sum_{l=1}^{M} n_l^0 n_l - M \tanh F \right) \delta\left( \sum_{l=1}^{M} n_l^0 - M \tanh F \right) \right\rangle_{n^0} \sim \exp[-M \mathcal{K}(\omega, F)]$,
where $\mathcal{K}(\omega, F) = \ldots$.

Further, the remaining average required in Eq. (5) is evaluated as
$\left\langle \mathop{\rm Tr}_{n \ne 1} \delta\left( \sum_{l=1}^{M} n_l - M\omega \right) \prod_{\mu=1}^{M-N} \delta\left( 1 ; \prod_{l \in \mathcal{L}(\mu)} n_l \right) \right\rangle_H \sim \exp[M \mathcal{R}(\omega)]$,
where the exponent $\mathcal{R}(\omega)$ is termed the weight enumerator [...]. This provides an
averaged distribution of the distances between the true noise and other vectors that satisfy
Eq. (1) in the current context $^{(1)}$, and plays an important role in evaluating the
performance of codes in conventional coding theory [10]. From the above, one obtains
$\mathcal{E}(1) = \mathop{\rm Ext}_{\omega} \{ \mathcal{K}(\omega, F) - \mathcal{R}(\omega) \}$,
where Ext{...} denotes an extremization. This corresponds to Eq. (4.7) in [...].

However, it should be emphasized here that one can evaluate $\mathcal{E}(1)$ without introducing
the weight variable $\omega$. Moreover, it is evident that the tightest estimate (exact value)
of $p_c$ can be obtained by evaluating $\mathcal{E}(0) = \lim_{\rho \to +0} \mathcal{E}(\rho)$.
This can be carried out by the replica method, which gives rise to a set of order parameters
$q_{\alpha, \beta, \ldots, \gamma} = (1/M) \sum_{l=1}^{M} Z_l\, n_l^{\alpha} n_l^{\beta} \cdots n_l^{\gamma}$,
where $\alpha, \beta, \ldots$ represent replica indices and the variable $Z_{l=1,\ldots,M}$
comes from enforcing the restriction of $C$ connections per index as in [...].

Further calculation requires a certain ansatz about the symmetry of the order parameters. 
As a first approximation we assume replica symmetry (RS) in the following order parameters 
and their conjugate variables 



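$$q_{\alpha, \beta, \ldots, \gamma} = q \int dx\, \pi(x)\, x^{l}, \qquad \hat{q}_{\alpha, \beta, \ldots, \gamma} = \hat{q} \int d\hat{x}\, \hat{\pi}(\hat{x})\, \hat{x}^{l}, \tag{7}$$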



where $l$ denotes the number of replica indices, and $q$ and $\hat{q}$ are normalization
variables for defining $\pi(\cdot)$ and $\hat{\pi}(\cdot)$ as distributions. Unspecified
integrals are carried out over the interval $[-1, 1]$.

Originally, the summation $\mathop{\rm Tr}_{n \ne 1}(\cdot)$ excludes the case $n = 1$; but one
can show that in the large $M$ limit this becomes identical to the full summation in the
non-ferromagnetic phase, where $\pi(x) \ne \delta(x - 1)$ and
$\hat{\pi}(\hat{x}) \ne \delta(\hat{x} - 1)$. In addition, we employ Morita's scheme, which
in this case converts the restricted annealed average with respect to $n^0$ to a quenched one:



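$$\frac{1}{M} \ln \left\langle (\cdots) \times \delta\left( \sum_{l=1}^{M} n_l^0 - M \tanh F \right) \right\rangle_{n^0} = \frac{1}{M} \left\langle \ln (\cdots) \right\rangle_{n^0}, \tag{8}$$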



to simplify the calculation of the average over $n^0$ in Eq. (5) considerably, and obtain

$$\mathcal{E}(\rho) = \mathop{\rm Ext}^{*}_{\{q, \hat{q}, \pi(\cdot), \hat{\pi}(\cdot), G\}} \left\{ \cdots \right\}, \tag{9}$$



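where $\langle\langle \cdots \rangle\rangle_{n^0} = \mathop{\rm Tr}_{n^0 = \pm 1}(\cdots)$ and
${\rm Ext}^{*}\{\ldots\}$ denotes the functional extremization excluding the possibility of
$\pi(x) = \delta(x - 1)$ and $\hat{\pi}(\hat{x}) = \delta(\hat{x} - 1)$, as is introduced in [...].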



(1) The weight enumerator is usually introduced for the distance between codewords [...].
However, since $y^0 - y^1 = n^0 - n^1$ (mod 2) holds for two sets of Boolean vectors
$(y^0, n^0)$ and $(y^1, n^1)$ that satisfy $y = y^0 + n^0 = y^1 + n^1$ (mod 2), the distance
between the noise vectors $n^0$ and $n^1$ is identical to that for the codewords $y^0$ and $y^1$.
Two analytical solutions for $\pi(x)$ and $\hat{\pi}(\hat{x})$ can be obtained in the limit
$K, C \to \infty$, keeping the code rate $R = N/M = 1 - C/K$ finite:

1. $\pi(x) = \frac{1}{2}[(1 + \tanh F)\, \delta(x - \tanh F) + (1 - \tanh F)\, \delta(x + \tanh F)]$, $\hat{\pi}(\hat{x}) = \delta(\hat{x})$

2. $\pi(x) = \frac{1}{2}[\delta(x - 1) + \delta(x + 1)]$, $\hat{\pi}(\hat{x}) = \frac{1}{2}[\delta(\hat{x} - 1) + \delta(\hat{x} + 1)]$.

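One can show that both of these are locally stable against perturbations to the RS solutions,
providing $\mathcal{E}(\rho) = -\rho\, [\mathcal{H}(\tanh F) - (1 - R) \ln 2]$ and
$\mathcal{E}(\rho) = -[\mathcal{H}(\tanh F) - (1 - R) \ln 2]$, respectively, where
$\mathcal{H}(\tanh F) = -p \ln p - (1 - p) \ln (1 - p)$ is the entropy (in nats) of the typical noise.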

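Selecting the relevant branch that has the lower exponent for $\rho \ge 1$ and taking the limit
$\rho \to +0$, one obtains the exponent as

$$\mathcal{E}(0) = \lim_{\rho \to +0} \mathcal{E}(\rho) = \begin{cases} (R_c - R) \ln 2, & R < R_c, \\ 0, & \text{otherwise}, \end{cases} \tag{10}$$

where $R_c = 1 + p \log_2 p + (1 - p) \log_2 (1 - p)$ corresponds to Shannon's limit [...].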
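
The expression for $R_c$ can be evaluated and inverted numerically. The following minimal sketch assumes only the formula for $R_c$ stated above; the bisection routine and its names are illustrative assumptions, not part of the analysis.

```python
import math

def shannon_rate(p):
    """R_c = 1 + p log2 p + (1 - p) log2(1 - p) for a BSC with flip probability p."""
    return 1 + p * math.log2(p) + (1 - p) * math.log2(1 - p)

def shannon_noise_limit(R, tol=1e-12):
    """Bisection for the flip probability in (0, 1/2) at which R_c(p) = R, for 0 < R < 1."""
    lo, hi = 1e-12, 0.5
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if shannon_rate(mid) > R:
            lo = mid      # rate still above R: the limit lies at higher noise
        else:
            hi = mid
    return 0.5 * (lo + hi)

p_shannon = shannon_noise_limit(0.5)   # highest tolerable noise for code rate R = 1/2
```

For $R = 1/2$ the result is close to $p = 0.110$, which lies above both noise levels $p = 0.0915$ and $p = 0.0990$ discussed for $K = 6$ and $C = 3$.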

Note that in the vicinity of $R \sim R_c$ this exponent exceeds the upper bound of the possible
reliability function, which represents the vanishing rate of the decoding error probability for
the best code [...]. However, this does not imply any contradiction, because the current
analysis is just for $P_{II}$ while the convergence rate of $P_I$ is slower than that of the
reliability function.

For finite $K$ and $C$, one can obtain $\mathcal{E}(\rho)$ via numerical methods. Similar to the
case of $K, C \to \infty$, there generally appear two branches of solutions:

1. continuous distributions for $\pi(x)$ and $\hat{\pi}(\hat{x})$, for which $\lim_{\rho \to +0} \mathcal{E}(\rho) = 0$.

2. $\rho$ independent frozen distributions $\pi(x) = \frac{1}{2}[(1 + b)\, \delta(x - 1) + (1 - b)\, \delta(x + 1)]$,
$\hat{\pi}(\hat{x}) = \frac{1}{2}[(1 + \hat{b})\, \delta(\hat{x} - 1) + (1 - \hat{b})\, \delta(\hat{x} + 1)]$.

The parameters $b$ and $\hat{b}$ are determined from the extremization problem (9) by setting
$\rho = 1$; the functional extremization with respect to $\pi(\cdot)$ and $\hat{\pi}(\cdot)$ is
then reduced to that for the first moments $b = \int dx\, \pi(x)\, x$ and
$\hat{b} = \int d\hat{x}\, \hat{\pi}(\hat{x})\, \hat{x}$. The exponent of this branch is
completely frozen to that for $\rho = 1$, as $\mathcal{E}(\rho) = \mathcal{E}(1)$ for
$\forall \rho > 0$. Although the distributions of the two branches look quite different, their
exponents coincide at $\rho = 1$ in any situation.

Note that the frozen branch corresponds to the conventional IT analysis [...], and would
provide the exact estimate in the absence of other solutions. However, in order to take the
appropriate limit $\lim_{\rho \to +0} \mathcal{E}(\rho)$, one has to select the dominant branch
for $\rho \ge 1$ among the existing solutions, and the frozen branch does not necessarily
provide the correct exponent for $\rho \to +0$. Indeed, the scenario suggested by our analysis
supports this statement (Fig. 1).

When the channel noise $p$ is sufficiently high (Fig. 1 (a)), the exponent for the continuous
branch is monotonically decreasing with respect to $\rho$, which implies that this is the
dominant branch for $\rho \ge 1$. This provides $\lim_{\rho \to +0} \mathcal{E}(\rho) = 0$.
However, for lower $p$, $\mathcal{E}(\rho)$ of the continuous branch is maximized to a positive
value for a certain parameter $\rho_g$ (Fig. 1 (b)). In this situation, the solution for
$0 < \rho < \rho_g$ is physically wrong because inequality (6) does not hold. The frozen replica
symmetry breaking (RSB) solution [...] (a one step RSB ansatz under the constraint
$(1/M)\, n^a \cdot n^b = 1$ for replica indices $a$ and $b$ in the same subgroup) is a
suitable scheme for obtaining a consistent solution. Employing this 1RSB solution, one finds
$\mathcal{E}(\rho) = \mathcal{E}(\rho_g)$ for $0 < \rho < \rho_g$, which implies
$\lim_{\rho \to +0} \mathcal{E}(\rho) = \mathcal{E}(\rho_g) > 0$, indicating a vanishing
behaviour $P_{II} \sim \exp[-M \mathcal{E}(\rho_g)]$. These imply that the critical condition
determining the error threshold $p_c$ is given by
$\partial \mathcal{E}(\rho) / \partial \rho |_{\rho = +0} = 0$, computed for the continuous
solution. Employing the gauge transform [15], one can show that the variational parameter $G$
in Eq. (9)
Fig. 1. - Appropriate limits for $\lim_{\rho \to +0} \mathcal{E}(\rho)$ in the case of finite $K$ and $C$. The solution that has the
lower exponent for $\rho \ge 1$ should be selected as the relevant branch [...], which is drawn as a thick curve
or line in each case. For $p > p_c$ (a), the continuous solution is relevant, while the frozen 1RSB solution
which emerges from this solution at $\rho = \rho_g$ provides an appropriate exponent $\mathcal{E}(\rho_g)$ for $p_b < p < p_c$
(b). For $0 < p < p_b$ (c), the frozen (RS) solution is relevant. In the limit $K, C \to \infty$, the situation
(b) does not appear.

Fig. 2. - (a): Numerically computed $\mathcal{E}(\rho)$ of the continuous branch for $p = 0.0915, 0.0990$ for $K = 6$
and $C = 3$ ($R = 1/2$). Symbols and error bars are obtained from 50 numerical solutions. Curves are
computed via a quadratic fit. For $p = 0.0915$, $\mathcal{E}(\rho)$ is maximized to a positive value $\mathcal{E}(\rho_g) \simeq 2.5 \times 10^{-\ldots}$
for $\rho_g \simeq 0.5$, while it vanishes at $\rho \simeq 1$ as is suggested in the IT literature [...]. On the other hand, for
$p = 0.0990$, our predicted threshold, it is maximized to zero at $\rho \simeq 0$, which implies that this is the
correct threshold. (b): Comparison of the estimates of $p_c$ between the IT and the current methods is
summarized in a table. The estimates for the IT method are taken from [11]. The numerical precision
is up to the last digit for the current method. Shannon's limit denotes the highest possible $p_c$ for a
given code rate.



enforcing $\sum_{l=1}^{M} n_l^0 = M \tanh F$ coincides with $F$ in this limit. Then, the critical
condition is summarized as



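$$F \tanh F - \frac{1}{M} \left\langle \left\langle \ln \left[ \mathop{\rm Tr}_{n} \prod_{\mu=1}^{M-N} \delta\left( 1 ; \prod_{l \in \mathcal{L}(\mu)} n_l \right) \exp\left( F \sum_{l=1}^{M} n_l^0 n_l \right) \right] \right\rangle \right\rangle_{H, n^0} = 0, \tag{11}$$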



which is identical to what has been obtained for the phase boundary of the ferro-paramagnetic
transition along Nishimori's temperature predicted by the existing replica analysis [...].

As $p$ is reduced further, the position of the maximum $\rho_g$ moves to the right and exceeds
$\rho = 1$ at another critical noise rate $p_b$. This implies that below $p_b$ the limit
$\rho \to +0$ is governed by the frozen (RS) solution, which is identical to what is given by
the conventional IT analysis (Fig. 1 (c)). However, this situation is realized only
sufficiently far below the threshold, and the solution is of no use for direct evaluation of
$p_c$, although it provides a lower bound.
Finally, we examined the case of $K = 6$ and $C = 3$ to demonstrate the accuracy of the
estimated threshold. We numerically evaluated $\mathcal{E}(\rho)$ of the continuous branch for
$p = 0.0915$, a recent highly accurate estimate of the error threshold for this parameter
choice [...], and for $p = 0.0990$, which is the threshold predicted by the replica method.
The numerical results are obtained by approximating $\pi(\cdot)$ and $\hat{\pi}(\cdot)$ using
$10^{\ldots}$ dimensional vectors and iterating the saddle point equations until convergence.
The obtained results are shown in Fig. 2 (a); they indicate
$\max_{\rho} \mathcal{E}(\rho) \simeq 2.5 \times 10^{-\ldots}$ for $p = 0.0915$, while
$\mathcal{E}(\rho)$ is maximized (to zero) at $\rho \simeq 0$ for $p = 0.0990$, suggesting a
tighter estimate for the error threshold than those reported so far. A comparison for other
parameter choices is also summarized in Fig. 2 (b).

In summary, we have investigated the performance of typical set decoding for ensembles
of Gallager's codes. We have shown that a direct evaluation of the average type II error
probability over the ensemble becomes possible by employing the replica method. The link to the
existing IT analysis, which is based on the weight enumerator, has also been clarified. Although
the weight enumerator does not play a crucial role in determining the error threshold in the
current analysis, it still remains a key factor for the error rate in low $R$ regions. Its
analysis from the viewpoint of statistical physics is under way [...].
of the JSPS (YK), EPSRC (GR/N00562) and The Royal Society (JVM). David Saad is